 binding task



Mixing Mechanisms: How Language Models Retrieve Bound Entities In-Context

Gur-Arieh, Yoav, Geva, Mor, Geiger, Atticus

arXiv.org Artificial Intelligence

A key component of in-context reasoning is the ability of language models (LMs) to bind entities for later retrieval. For example, an LM might represent "Ann loves pie" by binding "Ann" to "pie", allowing it to later retrieve "Ann" when asked "Who loves pie?" Prior research on short lists of bound entities found strong evidence that LMs implement such retrieval via a positional mechanism, where "Ann" is retrieved based on its position in context. In this work, we find that this mechanism generalizes poorly to more complex settings; as the number of bound entities in context increases, the positional mechanism becomes noisy and unreliable in middle positions. To compensate for this, we find that LMs supplement the positional mechanism with a lexical mechanism (retrieving "Ann" using its bound counterpart "pie") and a reflexive mechanism (retrieving "Ann" through a direct pointer). Through extensive experiments on nine models and ten binding tasks, we uncover a consistent pattern in how LMs mix these mechanisms to drive model behavior. We leverage these insights to develop a causal model combining all three mechanisms that estimates next token distributions with 95% agreement. Finally, we show that our model generalizes to substantially longer inputs of open-ended text interleaved with entity groups, further demonstrating the robustness of our findings in more natural settings. Overall, our study establishes a more complete picture of how LMs bind and retrieve entities in-context.


Working Memory for Online Memory Binding Tasks: A Hybrid Model

Yazdi, Seyed Mohammad Mahdi Heidarpoor, Abbassian, Abdolhossein

arXiv.org Artificial Intelligence

Working memory is the brain module that holds and manipulates information online. In this work, we design a hybrid model in which a simple feed-forward network is coupled to a balanced random network via a read-write vector called the interface vector. First, we consider some simple memory binding tasks in which the output is set to be a copy of the given input and a selective sequence of previous inputs online. Next, we design a more complex binding task based on a cue that encodes binding relations. The important result is that our dual-component model of working memory performs well with learning restricted to the feed-forward component only; the random network is not trained, and we exploit its properties as they are. To our knowledge, this is the first time that random networks acting as a flexible memory have been shown to play an important role in online binding tasks. We may interpret our results as a candidate model of working memory in which the feed-forward network, acting as an attentional-controlling executive system, learns to interact with the random network that serves as temporary storage.